From Passive to Active Reasoning: Can Large Language Models Ask the Right Questions under Incomplete Information?

Zhou, Zhanke, Feng, Xiao, Zhu, Zhaocheng, Yao, Jiangchao, Koyejo, Sanmi, Han, Bo

arXiv.org Artificial Intelligence

While existing benchmarks probe the reasoning abilities of large language models (LLMs) across diverse domains, they predominantly assess passive reasoning, providing models with all the information needed to reach a solution. By contrast, active reasoning, in which an LLM must interact with external systems to acquire missing evidence or data, has received little systematic attention. To address this shortfall, we present AR-Bench, a novel benchmark designed explicitly to evaluate an LLM's active reasoning skills. AR-Bench comprises three task families (detective cases, situation puzzles, and guessing numbers) that together simulate real-world, agentic scenarios and measure performance across commonsense, logical, and symbolic reasoning challenges. Empirical evaluation on AR-Bench demonstrates that contemporary LLMs exhibit pronounced difficulties with active reasoning: they frequently fail to acquire or leverage the information needed to solve tasks. This gap highlights a stark divergence between their passive and active reasoning abilities. Moreover, ablation studies indicate that even advanced strategies, such as tree-based searching or post-training approaches, yield only modest gains and fall short of the levels required for real-world deployment. Collectively, these findings highlight the critical need to advance methodology for active reasoning, e.g., incorporating interactive learning, real-time feedback loops, and environment-aware objectives for training. The benchmark is publicly available at: https://github.com/tmlr-group/AR-Bench.


QuestBench: Can LLMs ask the right question to acquire information in reasoning tasks?

Li, Belinda Z., Kim, Been, Wang, Zi

arXiv.org Artificial Intelligence

Recently, a great deal of work has focused on improving large language models' (LLMs') performance on reasoning benchmarks such as math and logic. However, past work has largely assumed that tasks are well-defined. In the real world, queries to LLMs are often underspecified and solvable only by acquiring missing information. We formalize this as a constraint satisfaction problem (CSP) with missing variable assignments. Using a special case of this formalism where only one necessary variable assignment is missing, we can rigorously evaluate an LLM's ability to identify the minimal necessary question to ask and quantify axes of difficulty levels for each problem. We present QuestBench, a set of underspecified reasoning tasks solvable by asking at most one question, which includes: (1) Logic-Q: Logical reasoning tasks with one missing proposition, (2) Planning-Q: PDDL planning problems with initial states that are partially-observed, (3) GSM-Q: Human-annotated grade school math problems with one missing variable assignment, and (4) GSME-Q: a version of GSM-Q where word problems are translated into equations by human annotators. The LLM is tasked with selecting the correct clarification question(s) from a list of options. While state-of-the-art models excel at GSM-Q and GSME-Q, their accuracy is only 40-50% on Logic-Q and Planning-Q. Analysis demonstrates that the ability to solve well-specified reasoning problems may not be sufficient for success on our benchmark: models have difficulty identifying the right question to ask, even when they can solve the fully specified version of the problem. Furthermore, in the Planning-Q domain, LLMs tend not to hedge, even when explicitly presented with the option to predict "not sure." This highlights the need for deeper investigation into models' information acquisition capabilities.


A Complete Guide on Data Science & Analytics for Businesses

#artificialintelligence

In simpler terms, it means utilizing the powers of machine learning and artificial intelligence for performance improvement. The aim is to activate and increase the scope of automation in hitherto redundant data management systems. AI is asked to step in as an aid to humans rather than a replacement for them. Many companies have already started leveraging AI in their information systems. For instance, JobGet took Appinventiv's assistance in implementing AI technology in its application to make the job-seeking process less time-consuming. Using this technology, the team integrated location-based matching of employers and employees into the app. This way, employees could connect with employers in their vicinity, eliminating excess travel time.


The cult of AI, design is governance, Figma's career levels, UX fatigue

#artificialintelligence

Sabbath mode and assistive technology features "There's a secret mode that comes with almost all large ovens, refrigerators, dishwashers, and other large kitchen appliances. It is called Sabbath mode, and there is a very specific reason it is provided by the manufacturer." WYSIWYGPT "[ChatGPT] caused an avalanche of news items and blog posts calling 'game over' for web developers and claiming that soon all the work to build web products will be done by machines. Any person will be able to do that job -- all you need to do is ask the chatbot the right questions." Advertising folk, it's time to rethink what we're selling "Every time we step on a bus or open a social media app, the invisible hand of our industry is there to build demand for new products and services that might make life better. And with up to 10,000 ads floating into our consciousness every day, how can we resist?"


The art of asking the right questions - Big Think

#artificialintelligence

Many people, myself included, can find asking questions to be daunting. It fills us with worry and self-doubt, as though the act of being inquisitive is an all-too-public admission of our ignorance. Unfortunately, this can also lead us to find solace in answers -- no matter how shaky our understanding of the facts may be -- rather than risk looking stupid in front of others or even to ourselves. But once upon a time, we were all question-asking savants. We started grilling our parents as toddlers, and by preschool, our epistemic inquiries plumbed the depths of science, philosophy, and the social order.


Council Post: Implementing AI? You'd Better Think About Security First

#artificialintelligence

Stephanie is the Chief Security Technology Strategist at Intel. Artificial intelligence (AI) seems to be everywhere these days, from marketing programs to diagnostic laboratories. It's now increasingly common to build a custom AI model or buy commercial offerings powered by AI. But before you set that AI loose in the world and into your core business, make sure you understand the potential security pitfalls and take steps for responsible adoption of AI. Machine learning (ML) is the most common form of AI and is the process of training a machine to make future predictions based on historical data.


No code, no problem--we try to beat an AI at its own game with new tools

#artificialintelligence

Over the past year, machine learning and artificial intelligence technology have made significant strides. Specialized algorithms, including OpenAI's DALL-E, have demonstrated the ability to generate images from text prompts with increasing skill. Natural language processing (NLP) systems have grown closer to approximating human writing and text. And some people even think that an AI has attained sentience. Yet, as Ars' Matt Ford recently pointed out, artificial intelligence may be artificial, but it's not "intelligence," and it certainly isn't magic.


AI is transforming medicine: Here's how we make sure it works for everyone

#artificialintelligence

What if your doctor could instantly test dozens of different treatments to discover the perfect one for your body, your health and your values? In my lab at Stanford University School of Medicine, we are working on artificial intelligence (AI) technology to create a "digital twin": a virtual representation of you based on your medical history, genetic profile, age, ethnicity, and a host of other factors like whether you smoke and how much you exercise. If you're sick, the AI can test out treatment options on this computerized twin, running through countless different scenarios to predict which interventions will be most effective. Instead of choosing a treatment regimen based on what works for the average person, your doctor can develop a plan based on what works for you.


The Continuous Evolution of Artificial Intelligence in Our Society

#artificialintelligence

Artificial intelligence is changing the modern workplace, raising important questions for our society. Everybody knows artificial intelligence (AI) is meant to bring a huge competitive edge to those who successfully adopt it. The challenge, however, is to identify what makes AI adoption truly successful. In the past few years, we've seen many technology fads come and go. Maybe they didn't bring enough value.


Why Are So Many Data Scientists Quitting Their Jobs? - KDnuggets

#artificialintelligence

When I first started learning data science, I assumed that landing a job in the field meant that the hard part was over. After a few years of working in the industry, however, I have come to realize that I couldn't have been more wrong. Many data scientists I know have left their jobs within months of landing the position. I quit a data science internship one week after I joined, since I felt as though the tasks I was assigned had nothing to do with all the skills I'd painstakingly learnt. After speaking to co-workers in the data industry who, like me, had left their jobs at a very early stage in their careers, I've come to realize that there are two main reasons the data science field has such a high employee attrition rate: You spend thousands of hours learning statistics and the nuances of different machine learning algorithms.